Skip to content

Conversation

jhuber6
Copy link
Contributor

@jhuber6 jhuber6 commented Jan 17, 2025

Summary:
This patch matches CUDA, moving the HIP compilation jobs to the new
driver by default. The old behavior will return with
--no-offload-new-driver. The main difference is that objects compiled
with the old driver are no longer compatible and will need to be
recompiled or the old driver used.

@llvmbot llvmbot added clang Clang issues not falling into any other category clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' labels Jan 17, 2025
@llvmbot
Copy link
Member

llvmbot commented Jan 17, 2025

@llvm/pr-subscribers-clang-driver

Author: Joseph Huber (jhuber6)

Changes

Summary:
This patch matches CUDA, moving the HIP compilation jobs to the new
driver by default. The old behavior will return with
--no-offload-new-driver. The main difference is that objects compiled
with the old driver are no longer compatible and will need to be
recompiled or the old driver used.

Currently, we only regress on one feature: surface and texture
registering in RDC mode. Those runtime functions require more
information than I can pack in the current struct. I don't know how
important that is, or if we could hack around it.


Full diff: https://github.com/llvm/llvm-project/pull/123359.diff

15 Files Affected:

  • (modified) clang/lib/Driver/Driver.cpp (+5-9)
  • (modified) clang/lib/Driver/ToolChains/Clang.cpp (+3-3)
  • (modified) clang/test/Driver/cl-offload.cu (+2-3)
  • (modified) clang/test/Driver/hip-gz-options.hip (-1)
  • (modified) clang/test/Driver/hip-invalid-target-id.hip (+2-2)
  • (modified) clang/test/Driver/hip-macros.hip (-3)
  • (modified) clang/test/Driver/hip-offload-arch.hip (+2-2)
  • (modified) clang/test/Driver/hip-options.hip (+1-5)
  • (modified) clang/test/Driver/hip-sanitize-options.hip (+1-1)
  • (modified) clang/test/Driver/hip-save-temps.hip (+6-6)
  • (modified) clang/test/Driver/hip-toolchain-device-only.hip (-4)
  • (modified) clang/test/Driver/hip-toolchain-mllvm.hip (-2)
  • (modified) clang/test/Driver/hip-toolchain-no-rdc.hip (+1-1)
  • (modified) clang/test/Driver/invalid-offload-options.cpp (+1-1)
  • (modified) clang/unittests/Tooling/ToolingTest.cpp (+3-3)
diff --git a/clang/lib/Driver/Driver.cpp b/clang/lib/Driver/Driver.cpp
index 87855fdb799710..25b0afcbfe4081 100644
--- a/clang/lib/Driver/Driver.cpp
+++ b/clang/lib/Driver/Driver.cpp
@@ -4360,14 +4360,9 @@ void Driver::BuildActions(Compilation &C, DerivedArgList &Args,
 
   handleArguments(C, Args, Inputs, Actions);
 
-  bool UseNewOffloadingDriver =
-      C.isOffloadingHostKind(Action::OFK_OpenMP) ||
-      C.isOffloadingHostKind(Action::OFK_SYCL) ||
-      Args.hasFlag(options::OPT_foffload_via_llvm,
-                   options::OPT_fno_offload_via_llvm, false) ||
-      Args.hasFlag(options::OPT_offload_new_driver,
-                   options::OPT_no_offload_new_driver,
-                   C.isOffloadingHostKind(Action::OFK_Cuda));
+  bool UseNewOffloadingDriver = Args.hasFlag(
+      options::OPT_offload_new_driver, options::OPT_no_offload_new_driver,
+      C.getActiveOffloadKinds() != Action::OFK_None);
 
   // Builder to be used to build offloading actions.
   std::unique_ptr<OffloadingActionBuilder> OffloadBuilder =
@@ -5124,7 +5119,8 @@ Action *Driver::ConstructPhaseAction(
                    (TargetDeviceOffloadKind == Action::OFK_HIP &&
                     !Args.hasFlag(options::OPT_offload_new_driver,
                                   options::OPT_no_offload_new_driver,
-                                  C.isOffloadingHostKind(Action::OFK_Cuda))))
+                                  C.getActiveOffloadKinds() !=
+                                      Action::OFK_None)))
               ? types::TY_LLVM_IR
               : types::TY_LLVM_BC;
       return C.MakeAction<BackendJobAction>(Input, Output);
diff --git a/clang/lib/Driver/ToolChains/Clang.cpp b/clang/lib/Driver/ToolChains/Clang.cpp
index 33f08cf28feca1..e2e33bbef2b972 100644
--- a/clang/lib/Driver/ToolChains/Clang.cpp
+++ b/clang/lib/Driver/ToolChains/Clang.cpp
@@ -5073,7 +5073,7 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA,
       (JA.isHostOffloading(C.getActiveOffloadKinds()) &&
        Args.hasFlag(options::OPT_offload_new_driver,
                     options::OPT_no_offload_new_driver,
-                    C.isOffloadingHostKind(Action::OFK_Cuda)));
+                    C.getActiveOffloadKinds() != Action::OFK_None));
 
   bool IsRDCMode =
       Args.hasFlag(options::OPT_fgpu_rdc, options::OPT_fno_gpu_rdc, false);
@@ -5429,7 +5429,7 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA,
       if (IsDeviceOffloadAction && !JA.isDeviceOffloading(Action::OFK_OpenMP) &&
           !Args.hasFlag(options::OPT_offload_new_driver,
                         options::OPT_no_offload_new_driver,
-                        C.isOffloadingHostKind(Action::OFK_Cuda)) &&
+                        C.getActiveOffloadKinds() != Action::OFK_None) &&
           !Triple.isAMDGPU()) {
         D.Diag(diag::err_drv_unsupported_opt_for_target)
             << Args.getLastArg(options::OPT_foffload_lto,
@@ -6907,7 +6907,7 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA,
     CmdArgs.append({"--offload-new-driver", "-foffload-via-llvm"});
   } else if (Args.hasFlag(options::OPT_offload_new_driver,
                           options::OPT_no_offload_new_driver,
-                          C.isOffloadingHostKind(Action::OFK_Cuda))) {
+                          C.getActiveOffloadKinds() != Action::OFK_None)) {
     CmdArgs.push_back("--offload-new-driver");
   }
 
diff --git a/clang/test/Driver/cl-offload.cu b/clang/test/Driver/cl-offload.cu
index b05bf3b97b7eb7..8f1200f1733597 100644
--- a/clang/test/Driver/cl-offload.cu
+++ b/clang/test/Driver/cl-offload.cu
@@ -18,11 +18,10 @@
 // CUDA-SAME: "-Weverything"
 // CUDA: link
 
-// HIP: "-cc1" "-triple" "x86_64-pc-windows-msvc{{.*}}" "-aux-triple" "amdgcn-amd-amdhsa"
-// HIP-SAME: "-Weverything"
 // HIP: "-cc1" "-triple" "amdgcn-amd-amdhsa" "-aux-triple" "x86_64-pc-windows-msvc"
 // HIP-SAME: "-Weverything"
-// HIP: {{lld.* "-flavor" "gnu" "-m" "elf64_amdgpu"}}
+// HIP: "-cc1" "-triple" "x86_64-pc-windows-msvc{{.*}}" "-aux-triple" "amdgcn-amd-amdhsa"
+// HIP-SAME: "-Weverything"
 // HIP: {{link.* "amdhip64.lib"}}
 
 // CMake uses this option when finding packages for HIP, so
diff --git a/clang/test/Driver/hip-gz-options.hip b/clang/test/Driver/hip-gz-options.hip
index 7425d5fa847b3f..7bce8d5f66eebb 100644
--- a/clang/test/Driver/hip-gz-options.hip
+++ b/clang/test/Driver/hip-gz-options.hip
@@ -11,4 +11,3 @@
 // CHECK-DAG: {{".*clang.*" .* "--compress-debug-sections=zlib"}}
 // CHECK-DAG: {{".*lld" .* "--compress-debug-sections=zlib"}}
 // CHECK-DAG: {{".*clang.*" .* "--compress-debug-sections=zlib"}}
-// CHECK: "--compress-debug-sections=zlib"
diff --git a/clang/test/Driver/hip-invalid-target-id.hip b/clang/test/Driver/hip-invalid-target-id.hip
index 555043facb2a35..94e6c4b8bfe0aa 100644
--- a/clang/test/Driver/hip-invalid-target-id.hip
+++ b/clang/test/Driver/hip-invalid-target-id.hip
@@ -4,7 +4,7 @@
 // RUN:   --rocm-path=%S/Inputs/rocm \
 // RUN:   %s 2>&1 | FileCheck -check-prefix=NOPLUS %s
 
-// NOPLUS: error: invalid target ID 'gfx908xnack'
+// NOPLUS: error: unsupported HIP gpu architecture: gfx908xnack
 
 // RUN: not %clang -### --target=x86_64-linux-gnu \
 // RUN:   -x hip --offload-arch=gfx900 \
@@ -55,7 +55,7 @@
 // RUN:   --rocm-path=%S/Inputs/rocm \
 // RUN:   %s 2>&1 | FileCheck -check-prefix=NOCOLON %s
 
-// NOCOLON: error: invalid target ID 'gfx900+xnack'
+// NOCOLON: error: unsupported HIP gpu architecture: gfx900+xnack
 
 // RUN: not %clang -### --target=x86_64-linux-gnu \
 // RUN:   -x hip --offload-arch=gfx908 \
diff --git a/clang/test/Driver/hip-macros.hip b/clang/test/Driver/hip-macros.hip
index 3b3afba0b18ca3..36e0f71bd6eff6 100644
--- a/clang/test/Driver/hip-macros.hip
+++ b/clang/test/Driver/hip-macros.hip
@@ -73,8 +73,6 @@
 // RUN: %clang -E -dM --offload-arch=gfx940 --cuda-device-only -nogpuinc -nogpulib \
 // RUN:   %s 2>&1 | FileCheck --check-prefixes=NOPTS %s
 // PTS-DAG: #define __HIP_API_PER_THREAD_DEFAULT_STREAM__ 1
-// PTS-DAG: #define __HIP_API_PER_THREAD_DEFAULT_STREAM__ 1
-// PTS-DAG: #define HIP_API_PER_THREAD_DEFAULT_STREAM 1
 // PTS-DAG: #define HIP_API_PER_THREAD_DEFAULT_STREAM 1
 // NOPTS-NOT: #define __HIP_API_PER_THREAD_DEFAULT_STREAM__
 // NOPTS-NOT: #define HIP_API_PER_THREAD_DEFAULT_STREAM
@@ -85,4 +83,3 @@
 // RUN:   %s 2>&1 | FileCheck --check-prefix=APPROX %s
 // NOAPPROX-NOT: #define __CLANG_GPU_APPROX_TRANSCENDENTALS__
 // APPROX: #define __CLANG_GPU_APPROX_TRANSCENDENTALS__ 1
-// APPROX: #define __CLANG_GPU_APPROX_TRANSCENDENTALS__ 1
diff --git a/clang/test/Driver/hip-offload-arch.hip b/clang/test/Driver/hip-offload-arch.hip
index dd65a0e103ec69..1af53baf63da74 100644
--- a/clang/test/Driver/hip-offload-arch.hip
+++ b/clang/test/Driver/hip-offload-arch.hip
@@ -4,5 +4,5 @@
 // RUN:   -nogpuinc -nogpulib \
 // RUN:   %s 2>&1 | FileCheck %s
 
-// CHECK: {{"[^"]*clang[^"]*".* "-target-cpu" "gfx1030"}}
-// CHECK: {{"[^"]*clang[^"]*".* "-target-cpu" "gfx1031"}}
+// CHECK: "-cc1" "-triple" "amdgcn-amd-amdhsa"{{.*}}"-target-cpu" "gfx1030"
+// CHECK: "-cc1" "-triple" "amdgcn-amd-amdhsa"{{.*}}"-target-cpu" "gfx1031"
diff --git a/clang/test/Driver/hip-options.hip b/clang/test/Driver/hip-options.hip
index 8c13137735fb91..35db01029feca9 100644
--- a/clang/test/Driver/hip-options.hip
+++ b/clang/test/Driver/hip-options.hip
@@ -83,10 +83,6 @@
 // RUN:   --cuda-gpu-arch=gfx906 -foffload-lto=thin -fwhole-program-vtables %s 2>&1 \
 // RUN:   | FileCheck -check-prefix=HIPTHINLTO %s
 
-// RUN: %clang -### --target=x86_64-unknown-linux-gnu -nogpuinc -nogpulib \
-// RUN:   --cuda-gpu-arch=gfx906 -fgpu-rdc -foffload-lto=thin -fwhole-program-vtables %s 2>&1 \
-// RUN:   | FileCheck -check-prefix=HIPTHINLTO %s
-
 // Ensure we don't error about -fwhole-program-vtables for the non-device offload compile.
 // HIPTHINLTO-NOT: error: invalid argument '-fwhole-program-vtables' only allowed with '-flto'
 // HIPTHINLTO-NOT: "-cc1"{{.*}} "-triple" "x86_64-unknown-linux-gnu" {{.*}} "-flto-unit"
@@ -122,7 +118,7 @@
 
 // Check -Xoffload-linker option is passed to lld.
 
-// RUN: %clang -### --target=x86_64-unknown-linux-gnu -nogpuinc -nogpulib \
+// RUN: %clang -### --target=x86_64-unknown-linux-gnu -nogpuinc -nogpulib --no-offload-new-driver \
 // RUN:   --cuda-gpu-arch=gfx906 -fgpu-rdc -Xoffload-linker --build-id=md5 %s 2>&1 \
 // RUN:   | FileCheck -check-prefix=OFL-LINK %s
 // RUN: %clang -### --target=x86_64-unknown-linux-gnu -nogpuinc -nogpulib \
diff --git a/clang/test/Driver/hip-sanitize-options.hip b/clang/test/Driver/hip-sanitize-options.hip
index d94cbdacdaeb3a..da3766790376d5 100644
--- a/clang/test/Driver/hip-sanitize-options.hip
+++ b/clang/test/Driver/hip-sanitize-options.hip
@@ -56,8 +56,8 @@
 // NORDC-NOT: {{"[^"]*lld(\.exe){0,1}".*}} "[[OUT]]" {{".*asanrtl.bc" ".*hip.bc"}}
 // NORDC: {{"[^"]*clang[^"]*".* "-triple" "x86_64-unknown-linux-gnu".* "-fsanitize=address"}}
 
-// RDC: {{"[^"]*clang[^"]*".* "-triple" "x86_64-unknown-linux-gnu".* "-fsanitize=address"}}
 // RDC: {{"[^"]*clang[^"]*".* "-emit-llvm-bc".* "-fcuda-is-device".* "-mlink-bitcode-file" ".*asanrtl.bc".* "-mlink-builtin-bitcode" ".*hip.bc".* "-fsanitize=address".*}} "-o" "[[OUT:[^"]*.bc]]"
+// RDC: {{"[^"]*clang[^"]*".* "-triple" "x86_64-unknown-linux-gnu".* "-fsanitize=address"}}
 // RDC-NOT: {{"[^"]*lld(\.exe){0,1}".*}} "[[OUT]]" {{".*asanrtl.bc" ".*hip.bc"}}
 
 // FAIL: AMDGPU address sanitizer runtime library (asanrtl) is not found. Please install ROCm device library which supports address sanitizer
diff --git a/clang/test/Driver/hip-save-temps.hip b/clang/test/Driver/hip-save-temps.hip
index 142c3f1611a360..2e8489f65d7e0b 100644
--- a/clang/test/Driver/hip-save-temps.hip
+++ b/clang/test/Driver/hip-save-temps.hip
@@ -1,31 +1,31 @@
 // -fno-gpu-rdc without -o with -c
 // RUN: %clang -### --target=x86_64-linux-gnu -nogpulib -save-temps \
-// RUN:   -nogpuinc --offload-arch=gfx900 -c %s 2>&1 | \
+// RUN:   --no-offload-new-driver -nogpuinc --offload-arch=gfx900 -c %s 2>&1 | \
 // RUN:   FileCheck -check-prefixes=CHECK,NORDC %s
 
 // -fno-gpu-rdc without -o
 // RUN: %clang -### --target=x86_64-linux-gnu -nogpulib -save-temps \
-// RUN:   -nogpuinc --offload-arch=gfx900 %s 2>&1 | \
+// RUN:   --no-offload-new-driver -nogpuinc --offload-arch=gfx900 %s 2>&1 | \
 // RUN:   FileCheck -check-prefixes=CHECK,NORDC,NOUT %s
 
 // -fno-gpu-rdc with -o
 // RUN: %clang -### --target=x86_64-linux-gnu -nogpulib -save-temps \
-// RUN:   -nogpuinc -o executable --offload-arch=gfx900 %s 2>&1 | \
+// RUN:   --no-offload-new-driver -nogpuinc -o executable --offload-arch=gfx900 %s 2>&1 | \
 // RUN:   FileCheck -check-prefixes=CHECK,NORDC,WOUT %s
 
 // -fgpu-rdc without -o with -c
 // RUN: %clang -### --target=x86_64-linux-gnu -nogpulib -save-temps \
-// RUN:   -nogpuinc -fgpu-rdc --offload-arch=gfx900 -c %s 2>&1 | \
+// RUN:   --no-offload-new-driver -nogpuinc -fgpu-rdc --offload-arch=gfx900 -c %s 2>&1 | \
 // RUN:   FileCheck -check-prefixes=CHECK,RDC,RDCC %s
 
 // -fgpu-rdc without -o
 // RUN: %clang -### --target=x86_64-linux-gnu -nogpulib -save-temps \
-// RUN:   -nogpuinc -fgpu-rdc --offload-arch=gfx900 %s 2>&1 | \
+// RUN:   --no-offload-new-driver -nogpuinc -fgpu-rdc --offload-arch=gfx900 %s 2>&1 | \
 // RUN:   FileCheck -check-prefixes=CHECK,RDC,RDCL,NOUT %s
 
 // -fgpu-rdc with -o
 // UN: %clang -### --target=x86_64-linux-gnu -nogpulib -save-temps \
-// UN:   -nogpuinc -o executable -fgpu-rdc --offload-arch=gfx900 %s 2>&1 | \
+// UN:   --offload-new-driver -nogpuinc -o executable -fgpu-rdc --offload-arch=gfx900 %s 2>&1 | \
 // UN:   FileCheck -check-prefixes=CHECK,RDC,RDCL,WOUT %s
 
 // -fgpu-rdc host object path
diff --git a/clang/test/Driver/hip-toolchain-device-only.hip b/clang/test/Driver/hip-toolchain-device-only.hip
index 12097819f66888..c0621854f17cea 100644
--- a/clang/test/Driver/hip-toolchain-device-only.hip
+++ b/clang/test/Driver/hip-toolchain-device-only.hip
@@ -21,7 +21,3 @@
 
 // CHECK: [[LLD]] "-flavor" "gnu" "-m" "elf64_amdgpu" "--no-undefined" "-shared"
 // CHECK-SAME: "-o" "[[IMG_DEV_A_900:.*out]]" [[OBJ_DEV_A_900]]
-
-// CHECK: [[BUNDLER:".*clang-offload-bundler"]] "-type=o"
-// CHECK-SAME: "-targets={{.*}},hip{{.*}}-amdgcn-amd-amdhsa--gfx803,hip{{.*}}-amdgcn-amd-amdhsa--gfx900"
-// CHECK-SAME: "-input={{.*}}" "-input=[[IMG_DEV_A_803]]" "-input=[[IMG_DEV_A_900]]" "-output=[[BUNDLE_A:.*hipfb]]"
diff --git a/clang/test/Driver/hip-toolchain-mllvm.hip b/clang/test/Driver/hip-toolchain-mllvm.hip
index 33018cc398915b..bedb053b9006cc 100644
--- a/clang/test/Driver/hip-toolchain-mllvm.hip
+++ b/clang/test/Driver/hip-toolchain-mllvm.hip
@@ -30,13 +30,11 @@
 // CHECK-SAME: "-aux-triple" "x86_64-unknown-linux-gnu"
 // CHECK-SAME: {{.*}} "-target-cpu" "gfx803"
 // CHECK-SAME: {{.*}} "-mllvm" "-unroll-count=10" {{.*}}
-// CHECK: [[LLD:".*lld.*"]] {{.*}}"-m" "elf64_amdgpu"{{.*}} "-plugin-opt=-unroll-count=10"{{.*}} "-plugin-opt=-inline-threshold=100"
 
 // CHECK: [[CLANG]] "-cc1" "-triple" "amdgcn-amd-amdhsa"
 // CHECK-SAME: "-aux-triple" "x86_64-unknown-linux-gnu"
 // CHECK-SAME: {{.*}} "-target-cpu" "gfx900"
 // CHECK-SAME: {{.*}} "-mllvm" "-unroll-count=10" {{.*}}
-// CHECK: [[LLD:".*lld.*"]] {{.*}} "-plugin-opt=-unroll-count=10"{{.*}} "-plugin-opt=-inline-threshold=100"
 
 // NEG-NOT: {{".*opt"}}
 // NEG-NOT: {{".*llc"}}
diff --git a/clang/test/Driver/hip-toolchain-no-rdc.hip b/clang/test/Driver/hip-toolchain-no-rdc.hip
index 054db261d8e57e..25bd3d60d7cf80 100644
--- a/clang/test/Driver/hip-toolchain-no-rdc.hip
+++ b/clang/test/Driver/hip-toolchain-no-rdc.hip
@@ -36,7 +36,7 @@
 // RUN:   %t/a.o %t/b.o \
 // RUN: 2>&1 | FileCheck -check-prefixes=LKONLY %s
 
-// RUN: %clang -### --target=x86_64-linux-gnu \
+// RUN: %clang -### --target=x86_64-linux-gnu --no-offload-new-driver \
 // RUN:   --offload-arch=amdgcnspirv --offload-arch=gfx900 \
 // RUN:   %s -nogpuinc -nogpulib \
 // RUN: 2>&1 | FileCheck -check-prefixes=AMDGCNSPIRV %s
diff --git a/clang/test/Driver/invalid-offload-options.cpp b/clang/test/Driver/invalid-offload-options.cpp
index 48d5310538a3cf..a0b7f1bdbd3981 100644
--- a/clang/test/Driver/invalid-offload-options.cpp
+++ b/clang/test/Driver/invalid-offload-options.cpp
@@ -26,4 +26,4 @@
 // RUN:   --offload=amdgcn-amd-amdhsa --offload-arch=gfx900 %s \
 // RUN: 2>&1 | FileCheck --check-prefix=OFFLOAD-ARCH-MIX %s
 
-// OFFLOAD-ARCH-MIX: error: option '--offload-arch' cannot be specified with '--offload'
+// OFFLOAD-ARCH-MIX: error: option '--offload' cannot be specified with '--offload-arch'
diff --git a/clang/unittests/Tooling/ToolingTest.cpp b/clang/unittests/Tooling/ToolingTest.cpp
index 8cdfffb54390e0..8a581c0d5e07e2 100644
--- a/clang/unittests/Tooling/ToolingTest.cpp
+++ b/clang/unittests/Tooling/ToolingTest.cpp
@@ -436,9 +436,9 @@ TEST_F(CommandLineExtractorTest, AcceptOffloadingCompile) {
 TEST_F(CommandLineExtractorTest, AcceptOffloadingSyntaxOnly) {
   addFile("test.c", "int main() {}\n");
   const char *Args[] = {
-      "clang",         "-target",   "arm64-apple-macosx11.0.0",
-      "-fsyntax-only", "-x",        "hip",
-      "test.c",        "-nogpulib", "-nogpuinc"};
+      "clang",     "-target",  "arm64-apple-macosx11.0.0", "-fsyntax-only",
+      "-x",        "hip",      "--no-offload-new-driver",  "test.c",
+      "-nogpulib", "-nogpuinc"};
   EXPECT_NE(extractCC1Arguments(Args), nullptr);
 }
 

@llvmbot
Copy link
Member

llvmbot commented Jan 17, 2025

@llvm/pr-subscribers-clang

Author: Joseph Huber (jhuber6)

Changes

Summary:
This patch matches CUDA, moving the HIP compilation jobs to the new
driver by default. The old behavior will return with
--no-offload-new-driver. The main difference is that objects compiled
with the old driver are no longer compatible and will need to be
recompiled or the old driver used.

Currently, we only regress on one feature: surface and texture
registering in RDC mode. Those runtime functions require more
information than I can pack in the current struct. I don't know how
important that is, or if we could hack around it.


Full diff: https://github.com/llvm/llvm-project/pull/123359.diff

15 Files Affected:

  • (modified) clang/lib/Driver/Driver.cpp (+5-9)
  • (modified) clang/lib/Driver/ToolChains/Clang.cpp (+3-3)
  • (modified) clang/test/Driver/cl-offload.cu (+2-3)
  • (modified) clang/test/Driver/hip-gz-options.hip (-1)
  • (modified) clang/test/Driver/hip-invalid-target-id.hip (+2-2)
  • (modified) clang/test/Driver/hip-macros.hip (-3)
  • (modified) clang/test/Driver/hip-offload-arch.hip (+2-2)
  • (modified) clang/test/Driver/hip-options.hip (+1-5)
  • (modified) clang/test/Driver/hip-sanitize-options.hip (+1-1)
  • (modified) clang/test/Driver/hip-save-temps.hip (+6-6)
  • (modified) clang/test/Driver/hip-toolchain-device-only.hip (-4)
  • (modified) clang/test/Driver/hip-toolchain-mllvm.hip (-2)
  • (modified) clang/test/Driver/hip-toolchain-no-rdc.hip (+1-1)
  • (modified) clang/test/Driver/invalid-offload-options.cpp (+1-1)
  • (modified) clang/unittests/Tooling/ToolingTest.cpp (+3-3)
diff --git a/clang/lib/Driver/Driver.cpp b/clang/lib/Driver/Driver.cpp
index 87855fdb799710..25b0afcbfe4081 100644
--- a/clang/lib/Driver/Driver.cpp
+++ b/clang/lib/Driver/Driver.cpp
@@ -4360,14 +4360,9 @@ void Driver::BuildActions(Compilation &C, DerivedArgList &Args,
 
   handleArguments(C, Args, Inputs, Actions);
 
-  bool UseNewOffloadingDriver =
-      C.isOffloadingHostKind(Action::OFK_OpenMP) ||
-      C.isOffloadingHostKind(Action::OFK_SYCL) ||
-      Args.hasFlag(options::OPT_foffload_via_llvm,
-                   options::OPT_fno_offload_via_llvm, false) ||
-      Args.hasFlag(options::OPT_offload_new_driver,
-                   options::OPT_no_offload_new_driver,
-                   C.isOffloadingHostKind(Action::OFK_Cuda));
+  bool UseNewOffloadingDriver = Args.hasFlag(
+      options::OPT_offload_new_driver, options::OPT_no_offload_new_driver,
+      C.getActiveOffloadKinds() != Action::OFK_None);
 
   // Builder to be used to build offloading actions.
   std::unique_ptr<OffloadingActionBuilder> OffloadBuilder =
@@ -5124,7 +5119,8 @@ Action *Driver::ConstructPhaseAction(
                    (TargetDeviceOffloadKind == Action::OFK_HIP &&
                     !Args.hasFlag(options::OPT_offload_new_driver,
                                   options::OPT_no_offload_new_driver,
-                                  C.isOffloadingHostKind(Action::OFK_Cuda))))
+                                  C.getActiveOffloadKinds() !=
+                                      Action::OFK_None)))
               ? types::TY_LLVM_IR
               : types::TY_LLVM_BC;
       return C.MakeAction<BackendJobAction>(Input, Output);
diff --git a/clang/lib/Driver/ToolChains/Clang.cpp b/clang/lib/Driver/ToolChains/Clang.cpp
index 33f08cf28feca1..e2e33bbef2b972 100644
--- a/clang/lib/Driver/ToolChains/Clang.cpp
+++ b/clang/lib/Driver/ToolChains/Clang.cpp
@@ -5073,7 +5073,7 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA,
       (JA.isHostOffloading(C.getActiveOffloadKinds()) &&
        Args.hasFlag(options::OPT_offload_new_driver,
                     options::OPT_no_offload_new_driver,
-                    C.isOffloadingHostKind(Action::OFK_Cuda)));
+                    C.getActiveOffloadKinds() != Action::OFK_None));
 
   bool IsRDCMode =
       Args.hasFlag(options::OPT_fgpu_rdc, options::OPT_fno_gpu_rdc, false);
@@ -5429,7 +5429,7 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA,
       if (IsDeviceOffloadAction && !JA.isDeviceOffloading(Action::OFK_OpenMP) &&
           !Args.hasFlag(options::OPT_offload_new_driver,
                         options::OPT_no_offload_new_driver,
-                        C.isOffloadingHostKind(Action::OFK_Cuda)) &&
+                        C.getActiveOffloadKinds() != Action::OFK_None) &&
           !Triple.isAMDGPU()) {
         D.Diag(diag::err_drv_unsupported_opt_for_target)
             << Args.getLastArg(options::OPT_foffload_lto,
@@ -6907,7 +6907,7 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA,
     CmdArgs.append({"--offload-new-driver", "-foffload-via-llvm"});
   } else if (Args.hasFlag(options::OPT_offload_new_driver,
                           options::OPT_no_offload_new_driver,
-                          C.isOffloadingHostKind(Action::OFK_Cuda))) {
+                          C.getActiveOffloadKinds() != Action::OFK_None)) {
     CmdArgs.push_back("--offload-new-driver");
   }
 
diff --git a/clang/test/Driver/cl-offload.cu b/clang/test/Driver/cl-offload.cu
index b05bf3b97b7eb7..8f1200f1733597 100644
--- a/clang/test/Driver/cl-offload.cu
+++ b/clang/test/Driver/cl-offload.cu
@@ -18,11 +18,10 @@
 // CUDA-SAME: "-Weverything"
 // CUDA: link
 
-// HIP: "-cc1" "-triple" "x86_64-pc-windows-msvc{{.*}}" "-aux-triple" "amdgcn-amd-amdhsa"
-// HIP-SAME: "-Weverything"
 // HIP: "-cc1" "-triple" "amdgcn-amd-amdhsa" "-aux-triple" "x86_64-pc-windows-msvc"
 // HIP-SAME: "-Weverything"
-// HIP: {{lld.* "-flavor" "gnu" "-m" "elf64_amdgpu"}}
+// HIP: "-cc1" "-triple" "x86_64-pc-windows-msvc{{.*}}" "-aux-triple" "amdgcn-amd-amdhsa"
+// HIP-SAME: "-Weverything"
 // HIP: {{link.* "amdhip64.lib"}}
 
 // CMake uses this option when finding packages for HIP, so
diff --git a/clang/test/Driver/hip-gz-options.hip b/clang/test/Driver/hip-gz-options.hip
index 7425d5fa847b3f..7bce8d5f66eebb 100644
--- a/clang/test/Driver/hip-gz-options.hip
+++ b/clang/test/Driver/hip-gz-options.hip
@@ -11,4 +11,3 @@
 // CHECK-DAG: {{".*clang.*" .* "--compress-debug-sections=zlib"}}
 // CHECK-DAG: {{".*lld" .* "--compress-debug-sections=zlib"}}
 // CHECK-DAG: {{".*clang.*" .* "--compress-debug-sections=zlib"}}
-// CHECK: "--compress-debug-sections=zlib"
diff --git a/clang/test/Driver/hip-invalid-target-id.hip b/clang/test/Driver/hip-invalid-target-id.hip
index 555043facb2a35..94e6c4b8bfe0aa 100644
--- a/clang/test/Driver/hip-invalid-target-id.hip
+++ b/clang/test/Driver/hip-invalid-target-id.hip
@@ -4,7 +4,7 @@
 // RUN:   --rocm-path=%S/Inputs/rocm \
 // RUN:   %s 2>&1 | FileCheck -check-prefix=NOPLUS %s
 
-// NOPLUS: error: invalid target ID 'gfx908xnack'
+// NOPLUS: error: unsupported HIP gpu architecture: gfx908xnack
 
 // RUN: not %clang -### --target=x86_64-linux-gnu \
 // RUN:   -x hip --offload-arch=gfx900 \
@@ -55,7 +55,7 @@
 // RUN:   --rocm-path=%S/Inputs/rocm \
 // RUN:   %s 2>&1 | FileCheck -check-prefix=NOCOLON %s
 
-// NOCOLON: error: invalid target ID 'gfx900+xnack'
+// NOCOLON: error: unsupported HIP gpu architecture: gfx900+xnack
 
 // RUN: not %clang -### --target=x86_64-linux-gnu \
 // RUN:   -x hip --offload-arch=gfx908 \
diff --git a/clang/test/Driver/hip-macros.hip b/clang/test/Driver/hip-macros.hip
index 3b3afba0b18ca3..36e0f71bd6eff6 100644
--- a/clang/test/Driver/hip-macros.hip
+++ b/clang/test/Driver/hip-macros.hip
@@ -73,8 +73,6 @@
 // RUN: %clang -E -dM --offload-arch=gfx940 --cuda-device-only -nogpuinc -nogpulib \
 // RUN:   %s 2>&1 | FileCheck --check-prefixes=NOPTS %s
 // PTS-DAG: #define __HIP_API_PER_THREAD_DEFAULT_STREAM__ 1
-// PTS-DAG: #define __HIP_API_PER_THREAD_DEFAULT_STREAM__ 1
-// PTS-DAG: #define HIP_API_PER_THREAD_DEFAULT_STREAM 1
 // PTS-DAG: #define HIP_API_PER_THREAD_DEFAULT_STREAM 1
 // NOPTS-NOT: #define __HIP_API_PER_THREAD_DEFAULT_STREAM__
 // NOPTS-NOT: #define HIP_API_PER_THREAD_DEFAULT_STREAM
@@ -85,4 +83,3 @@
 // RUN:   %s 2>&1 | FileCheck --check-prefix=APPROX %s
 // NOAPPROX-NOT: #define __CLANG_GPU_APPROX_TRANSCENDENTALS__
 // APPROX: #define __CLANG_GPU_APPROX_TRANSCENDENTALS__ 1
-// APPROX: #define __CLANG_GPU_APPROX_TRANSCENDENTALS__ 1
diff --git a/clang/test/Driver/hip-offload-arch.hip b/clang/test/Driver/hip-offload-arch.hip
index dd65a0e103ec69..1af53baf63da74 100644
--- a/clang/test/Driver/hip-offload-arch.hip
+++ b/clang/test/Driver/hip-offload-arch.hip
@@ -4,5 +4,5 @@
 // RUN:   -nogpuinc -nogpulib \
 // RUN:   %s 2>&1 | FileCheck %s
 
-// CHECK: {{"[^"]*clang[^"]*".* "-target-cpu" "gfx1030"}}
-// CHECK: {{"[^"]*clang[^"]*".* "-target-cpu" "gfx1031"}}
+// CHECK: "-cc1" "-triple" "amdgcn-amd-amdhsa"{{.*}}"-target-cpu" "gfx1030"
+// CHECK: "-cc1" "-triple" "amdgcn-amd-amdhsa"{{.*}}"-target-cpu" "gfx1031"
diff --git a/clang/test/Driver/hip-options.hip b/clang/test/Driver/hip-options.hip
index 8c13137735fb91..35db01029feca9 100644
--- a/clang/test/Driver/hip-options.hip
+++ b/clang/test/Driver/hip-options.hip
@@ -83,10 +83,6 @@
 // RUN:   --cuda-gpu-arch=gfx906 -foffload-lto=thin -fwhole-program-vtables %s 2>&1 \
 // RUN:   | FileCheck -check-prefix=HIPTHINLTO %s
 
-// RUN: %clang -### --target=x86_64-unknown-linux-gnu -nogpuinc -nogpulib \
-// RUN:   --cuda-gpu-arch=gfx906 -fgpu-rdc -foffload-lto=thin -fwhole-program-vtables %s 2>&1 \
-// RUN:   | FileCheck -check-prefix=HIPTHINLTO %s
-
 // Ensure we don't error about -fwhole-program-vtables for the non-device offload compile.
 // HIPTHINLTO-NOT: error: invalid argument '-fwhole-program-vtables' only allowed with '-flto'
 // HIPTHINLTO-NOT: "-cc1"{{.*}} "-triple" "x86_64-unknown-linux-gnu" {{.*}} "-flto-unit"
@@ -122,7 +118,7 @@
 
 // Check -Xoffload-linker option is passed to lld.
 
-// RUN: %clang -### --target=x86_64-unknown-linux-gnu -nogpuinc -nogpulib \
+// RUN: %clang -### --target=x86_64-unknown-linux-gnu -nogpuinc -nogpulib --no-offload-new-driver \
 // RUN:   --cuda-gpu-arch=gfx906 -fgpu-rdc -Xoffload-linker --build-id=md5 %s 2>&1 \
 // RUN:   | FileCheck -check-prefix=OFL-LINK %s
 // RUN: %clang -### --target=x86_64-unknown-linux-gnu -nogpuinc -nogpulib \
diff --git a/clang/test/Driver/hip-sanitize-options.hip b/clang/test/Driver/hip-sanitize-options.hip
index d94cbdacdaeb3a..da3766790376d5 100644
--- a/clang/test/Driver/hip-sanitize-options.hip
+++ b/clang/test/Driver/hip-sanitize-options.hip
@@ -56,8 +56,8 @@
 // NORDC-NOT: {{"[^"]*lld(\.exe){0,1}".*}} "[[OUT]]" {{".*asanrtl.bc" ".*hip.bc"}}
 // NORDC: {{"[^"]*clang[^"]*".* "-triple" "x86_64-unknown-linux-gnu".* "-fsanitize=address"}}
 
-// RDC: {{"[^"]*clang[^"]*".* "-triple" "x86_64-unknown-linux-gnu".* "-fsanitize=address"}}
 // RDC: {{"[^"]*clang[^"]*".* "-emit-llvm-bc".* "-fcuda-is-device".* "-mlink-bitcode-file" ".*asanrtl.bc".* "-mlink-builtin-bitcode" ".*hip.bc".* "-fsanitize=address".*}} "-o" "[[OUT:[^"]*.bc]]"
+// RDC: {{"[^"]*clang[^"]*".* "-triple" "x86_64-unknown-linux-gnu".* "-fsanitize=address"}}
 // RDC-NOT: {{"[^"]*lld(\.exe){0,1}".*}} "[[OUT]]" {{".*asanrtl.bc" ".*hip.bc"}}
 
 // FAIL: AMDGPU address sanitizer runtime library (asanrtl) is not found. Please install ROCm device library which supports address sanitizer
diff --git a/clang/test/Driver/hip-save-temps.hip b/clang/test/Driver/hip-save-temps.hip
index 142c3f1611a360..2e8489f65d7e0b 100644
--- a/clang/test/Driver/hip-save-temps.hip
+++ b/clang/test/Driver/hip-save-temps.hip
@@ -1,31 +1,31 @@
 // -fno-gpu-rdc without -o with -c
 // RUN: %clang -### --target=x86_64-linux-gnu -nogpulib -save-temps \
-// RUN:   -nogpuinc --offload-arch=gfx900 -c %s 2>&1 | \
+// RUN:   --no-offload-new-driver -nogpuinc --offload-arch=gfx900 -c %s 2>&1 | \
 // RUN:   FileCheck -check-prefixes=CHECK,NORDC %s
 
 // -fno-gpu-rdc without -o
 // RUN: %clang -### --target=x86_64-linux-gnu -nogpulib -save-temps \
-// RUN:   -nogpuinc --offload-arch=gfx900 %s 2>&1 | \
+// RUN:   --no-offload-new-driver -nogpuinc --offload-arch=gfx900 %s 2>&1 | \
 // RUN:   FileCheck -check-prefixes=CHECK,NORDC,NOUT %s
 
 // -fno-gpu-rdc with -o
 // RUN: %clang -### --target=x86_64-linux-gnu -nogpulib -save-temps \
-// RUN:   -nogpuinc -o executable --offload-arch=gfx900 %s 2>&1 | \
+// RUN:   --no-offload-new-driver -nogpuinc -o executable --offload-arch=gfx900 %s 2>&1 | \
 // RUN:   FileCheck -check-prefixes=CHECK,NORDC,WOUT %s
 
 // -fgpu-rdc without -o with -c
 // RUN: %clang -### --target=x86_64-linux-gnu -nogpulib -save-temps \
-// RUN:   -nogpuinc -fgpu-rdc --offload-arch=gfx900 -c %s 2>&1 | \
+// RUN:   --no-offload-new-driver -nogpuinc -fgpu-rdc --offload-arch=gfx900 -c %s 2>&1 | \
 // RUN:   FileCheck -check-prefixes=CHECK,RDC,RDCC %s
 
 // -fgpu-rdc without -o
 // RUN: %clang -### --target=x86_64-linux-gnu -nogpulib -save-temps \
-// RUN:   -nogpuinc -fgpu-rdc --offload-arch=gfx900 %s 2>&1 | \
+// RUN:   --no-offload-new-driver -nogpuinc -fgpu-rdc --offload-arch=gfx900 %s 2>&1 | \
 // RUN:   FileCheck -check-prefixes=CHECK,RDC,RDCL,NOUT %s
 
 // -fgpu-rdc with -o
 // UN: %clang -### --target=x86_64-linux-gnu -nogpulib -save-temps \
-// UN:   -nogpuinc -o executable -fgpu-rdc --offload-arch=gfx900 %s 2>&1 | \
+// UN:   --offload-new-driver -nogpuinc -o executable -fgpu-rdc --offload-arch=gfx900 %s 2>&1 | \
 // UN:   FileCheck -check-prefixes=CHECK,RDC,RDCL,WOUT %s
 
 // -fgpu-rdc host object path
diff --git a/clang/test/Driver/hip-toolchain-device-only.hip b/clang/test/Driver/hip-toolchain-device-only.hip
index 12097819f66888..c0621854f17cea 100644
--- a/clang/test/Driver/hip-toolchain-device-only.hip
+++ b/clang/test/Driver/hip-toolchain-device-only.hip
@@ -21,7 +21,3 @@
 
 // CHECK: [[LLD]] "-flavor" "gnu" "-m" "elf64_amdgpu" "--no-undefined" "-shared"
 // CHECK-SAME: "-o" "[[IMG_DEV_A_900:.*out]]" [[OBJ_DEV_A_900]]
-
-// CHECK: [[BUNDLER:".*clang-offload-bundler"]] "-type=o"
-// CHECK-SAME: "-targets={{.*}},hip{{.*}}-amdgcn-amd-amdhsa--gfx803,hip{{.*}}-amdgcn-amd-amdhsa--gfx900"
-// CHECK-SAME: "-input={{.*}}" "-input=[[IMG_DEV_A_803]]" "-input=[[IMG_DEV_A_900]]" "-output=[[BUNDLE_A:.*hipfb]]"
diff --git a/clang/test/Driver/hip-toolchain-mllvm.hip b/clang/test/Driver/hip-toolchain-mllvm.hip
index 33018cc398915b..bedb053b9006cc 100644
--- a/clang/test/Driver/hip-toolchain-mllvm.hip
+++ b/clang/test/Driver/hip-toolchain-mllvm.hip
@@ -30,13 +30,11 @@
 // CHECK-SAME: "-aux-triple" "x86_64-unknown-linux-gnu"
 // CHECK-SAME: {{.*}} "-target-cpu" "gfx803"
 // CHECK-SAME: {{.*}} "-mllvm" "-unroll-count=10" {{.*}}
-// CHECK: [[LLD:".*lld.*"]] {{.*}}"-m" "elf64_amdgpu"{{.*}} "-plugin-opt=-unroll-count=10"{{.*}} "-plugin-opt=-inline-threshold=100"
 
 // CHECK: [[CLANG]] "-cc1" "-triple" "amdgcn-amd-amdhsa"
 // CHECK-SAME: "-aux-triple" "x86_64-unknown-linux-gnu"
 // CHECK-SAME: {{.*}} "-target-cpu" "gfx900"
 // CHECK-SAME: {{.*}} "-mllvm" "-unroll-count=10" {{.*}}
-// CHECK: [[LLD:".*lld.*"]] {{.*}} "-plugin-opt=-unroll-count=10"{{.*}} "-plugin-opt=-inline-threshold=100"
 
 // NEG-NOT: {{".*opt"}}
 // NEG-NOT: {{".*llc"}}
diff --git a/clang/test/Driver/hip-toolchain-no-rdc.hip b/clang/test/Driver/hip-toolchain-no-rdc.hip
index 054db261d8e57e..25bd3d60d7cf80 100644
--- a/clang/test/Driver/hip-toolchain-no-rdc.hip
+++ b/clang/test/Driver/hip-toolchain-no-rdc.hip
@@ -36,7 +36,7 @@
 // RUN:   %t/a.o %t/b.o \
 // RUN: 2>&1 | FileCheck -check-prefixes=LKONLY %s
 
-// RUN: %clang -### --target=x86_64-linux-gnu \
+// RUN: %clang -### --target=x86_64-linux-gnu --no-offload-new-driver \
 // RUN:   --offload-arch=amdgcnspirv --offload-arch=gfx900 \
 // RUN:   %s -nogpuinc -nogpulib \
 // RUN: 2>&1 | FileCheck -check-prefixes=AMDGCNSPIRV %s
diff --git a/clang/test/Driver/invalid-offload-options.cpp b/clang/test/Driver/invalid-offload-options.cpp
index 48d5310538a3cf..a0b7f1bdbd3981 100644
--- a/clang/test/Driver/invalid-offload-options.cpp
+++ b/clang/test/Driver/invalid-offload-options.cpp
@@ -26,4 +26,4 @@
 // RUN:   --offload=amdgcn-amd-amdhsa --offload-arch=gfx900 %s \
 // RUN: 2>&1 | FileCheck --check-prefix=OFFLOAD-ARCH-MIX %s
 
-// OFFLOAD-ARCH-MIX: error: option '--offload-arch' cannot be specified with '--offload'
+// OFFLOAD-ARCH-MIX: error: option '--offload' cannot be specified with '--offload-arch'
diff --git a/clang/unittests/Tooling/ToolingTest.cpp b/clang/unittests/Tooling/ToolingTest.cpp
index 8cdfffb54390e0..8a581c0d5e07e2 100644
--- a/clang/unittests/Tooling/ToolingTest.cpp
+++ b/clang/unittests/Tooling/ToolingTest.cpp
@@ -436,9 +436,9 @@ TEST_F(CommandLineExtractorTest, AcceptOffloadingCompile) {
 TEST_F(CommandLineExtractorTest, AcceptOffloadingSyntaxOnly) {
   addFile("test.c", "int main() {}\n");
   const char *Args[] = {
-      "clang",         "-target",   "arm64-apple-macosx11.0.0",
-      "-fsyntax-only", "-x",        "hip",
-      "test.c",        "-nogpulib", "-nogpuinc"};
+      "clang",     "-target",  "arm64-apple-macosx11.0.0", "-fsyntax-only",
+      "-x",        "hip",      "--no-offload-new-driver",  "test.c",
+      "-nogpulib", "-nogpuinc"};
   EXPECT_NE(extractCC1Arguments(Args), nullptr);
 }
 

@b-sumner
Copy link

Breaking any existing HIP feature is a no-go in my opinion. Textures and surfaces are important to some developers.

@arsenm
Copy link
Contributor

arsenm commented Jan 17, 2025

I don't follow the connection to textures

@jhuber6
Copy link
Contributor Author

jhuber6 commented Jan 17, 2025

Breaking any existing HIP feature is a no-go in my opinion. Textures and surfaces are important to some developers.

Surfaces and textures are being phased out of CUDA as far as I can tell, so I wasn't sure how relevant it was (since we can use --no-offload-new-driver as a workaround for now). CUDA clang doesn't even handle surfaces, and as far as I can tell the old cudaRegisterSurface functions are just totally missing.

Also, this is only in RDC-mode, which I'm just going to guess has almost no uses of surfaces and textures, but I could be wrong.

I don't follow the connection to textures

This codegen here,

// void __cudaRegisterSurface(void **, const struct surfaceReference *,
. The new driver works by putting the information into a "runtime-array" of structs and then creating a single initialization function that loops over those structs. The problem with these calls is that they use way more arguments than currently fit in the struct definition. I could change the struct, but then it would break OpenMP. I could also just change the definition for HIP / CUDA, but ideally they all share the same method for registering things so we can interoperate.

@b-sumner
Copy link

Surfaces and textures are being phased out of CUDA as far as I can tell, so I wasn't sure how relevant it was (since we can use --no-offload-new-driver as a workaround for now). CUDA clang doesn't even handle surfaces, and as far as I can tell the old cudaRegisterSurface functions are just totally missing.

Not phased out. Nvidia is trying to evolve the interfaces but will have to support the old ones for years to come.

@jhuber6
Copy link
Contributor Author

jhuber6 commented Jan 17, 2025

Surfaces and textures are being phased out of CUDA as far as I can tell, so I wasn't sure how relevant it was (since we can use --no-offload-new-driver as a workaround for now). CUDA clang doesn't even handle surfaces, and as far as I can tell the old cudaRegisterSurface functions are just totally missing.

Not phased out. Nvidia is trying to evolve the interfaces but will have to support the old ones for years to come.

If it's a blocker I can try to find a way to work around it. I don't think I found a single runtime test for it, so I figured it's hardly used.

@b-sumner
Copy link

If it's a blocker I can try to find a way to work around it. I don't think I found a single runtime test for it, so I figured it's hardly used.

https://github.com/ROCm/hip-tests/tree/amd-staging/catch/unit/surface

@jhuber6
Copy link
Contributor Author

jhuber6 commented Jan 17, 2025

I somehow totally forgot that I implemented surfaces and textures last year, it was managed that I was having problems with.

@jhuber6
Copy link
Contributor Author

jhuber6 commented Jan 17, 2025

@yxsamliu Here's the general problem, the __hipRegisterManagedVar call takes the following arguments.

 __hipRegisterManagedVar(void ** handle, char *ManagedVarPtr, char *VarPtr, const char *VarName, size_t Size, unsigned Alignment)

But the struct that we store this information is (inherited from OpenMP) just the following:

struct __tgt_offload_entry {                                                                                                                                                                             
   void    *addr;        
   char    *name;                                                                                                                           
   size_t  size;                                                                                                                   
   int32_t flags;                                                                                                                                                                                              
   int32_t data;                  
};

I'd like to avoid modifying this struct to avoid breaking ABI with OpenMP. Perhaps I should make addr a pointer to a struct and just GEP the two values.

@b-sumner
Copy link

I'd like to avoid modifying this struct to avoid breaking ABI with OpenMP. Perhaps I should make addr a pointer to a struct and just GEP the two values.

Are we trying to jam a square HIP peg into a round OpenMP hole, or are we truly wanting to move to a language neutral base? If the latter, why don't we fix the base if it isn't sufficient?

@jhuber6
Copy link
Contributor Author

jhuber6 commented Jan 17, 2025

I'd like to avoid modifying this struct to avoid breaking ABI with OpenMP. Perhaps I should make addr a pointer to a struct and just GEP the two values.

Are we trying to jam a square HIP peg into a round OpenMP hole, or are we truly wanting to move to a language neutral base? If the latter, why don't we fix the base if it isn't sufficient?

At some point I wanted to update the struct to be more generic, but it was a surprising amount of work without breaking backward compatibility w/ the OpenMP runtime. Though honestly, I forget if we ever even guaranteed that kind of backward compatibility. Long term what I wanted is that instead of using hip_offloading_entries and omp_offloading_entires we'd just have offloading_entries as a section and each entry has a kind in it when we register them.

For now, it's probably easier just to put some indirection in this and pass a struct to the first argument.

@b-sumner
Copy link

For now, it's probably easier just to put some indirection in this and pass a struct to the first argument.

It would be even easier to not make the move. Does "addr" have different meanings in each context where an offload_entry is used? Is this going to confuse the debugger?

@jhuber6
Copy link
Contributor Author

jhuber6 commented Jan 17, 2025

For now, it's probably easier just to put some indirection in this and pass a struct to the first argument.

It would be even easier to not make the move. Does "addr" have different meanings in each context where an offload_entry is used? Is this going to confuse the debugger?

It's just a struct, what the argument do is determined by an enum, so when given a managed var, we'd just GEP the pointer instead of passing it directly. I probably should work on unifying all of these though, but that's a larger challenge since I'm just trying to get it over the finish line right now.

@jhuber6
Copy link
Contributor Author

jhuber6 commented Jan 18, 2025

Fixed the managed stuff in #123437, should be good enough for the foreseeable future.

@jhuber6 jhuber6 force-pushed the HIPNewDriver branch 2 times, most recently from bd32631 to f76dd10 Compare July 25, 2025 21:13
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the old driver isn't being removed, the tests should have run lines with the new and old version

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally we can remove it later, mostly was just trying to avoid duplicating all the tests. I guess I could just force this to use the old driver since this functionality is tested in other places for the new driver.

Side note, I think the no-RDC mode is still broken after the changes Yaxunl made so I need to investigate those.

@jhuber6 jhuber6 force-pushed the HIPNewDriver branch 2 times, most recently from 6a1437d to 54917a3 Compare July 28, 2025 19:28
jhuber6 referenced this pull request Jul 28, 2025
This fixes the issue of rdc linking static libraries with device code

CHIP-SPV/chipStar#984

---------

Co-authored-by: Henry Linjamäki <[email protected]>
@jhuber6
Copy link
Contributor Author

jhuber6 commented Jul 28, 2025

@linehill Right now the HIPSPV tests I passed --no-offload-new-driver. I'm not sure what's required to update those as well since they seem to use some custom plugin and linker behavior. Can you take a look and tell me what you think needs to be changed because this right now will likely break your setup and I don't know how to test it or use it.

@linehill
Copy link
Contributor

Sorry for the delayed response - I have been on a long vacation.

I’m not familiar with the new offload driver internals to give pointers on what needs to be changed and where for retaining the custom plugin and linker behavior needed by the chipStar nor I can devote time to look at it currently.

Perhaps, the HIPSPV toolchain could default to the old offload driver in the meantime until I or someone else takes a look at it.

cc @pjaaskel and @pvelesko for awareness.

@pvelesko
Copy link
Contributor

thanks for the ping, I will review this PR by the end of next week. @jhuber6 @linehill

@jhuber6
Copy link
Contributor Author

jhuber6 commented Aug 14, 2025

Sorry for the delayed response - I have been on a long vacation.

I’m not familiar with the new offload driver internals to give pointers on what needs to be changed and where for retaining the custom plugin and linker behavior needed by the chipStar nor I can devote time to look at it currently.

Perhaps, the HIPSPV toolchain could default to the old offload driver in the meantime until I or someone else takes a look at it.

cc @pjaaskel and @pvelesko for awareness.

The steps are roughly outlined in https://clang.llvm.org/docs/OffloadingDesign.html, though it hasn't been updated for the other targets. Generally it means you'll need to do some more plumbing. The problem with HIPSPV here is that it still uses the OFK_HIP type, which means it's not easy to determine whether or not to use the new driver from the compiler stages, and even so I'd really prefer to be able to delete all the old handling soon.

More or less, the main change is that all offloading targets go through the same function. For non-RDC compilation it's not too different, but for RDC compilation everything gets put into a fat binary using a custom binary format inside LLVM. This then gets passed to the linker wrapper, whose job is to extract those objects and generate a linker job. So, what you need is for clang --target=<whatever your triple> a b c to work.

@jhuber6
Copy link
Contributor Author

jhuber6 commented Aug 14, 2025

I'd suggest you just try doing --offload-new-driver with -### and -v and see what doesn't work.

@linehill
Copy link
Contributor

I'd suggest you just try doing --offload-new-driver with -### and -v and see what doesn't work.

Running the hipspv-toolchain.hip case with the --offload-new-driver option reveals that it is missing steps for running custom pass plugin on the final device bitcode and SPIR-V translation*, crucial for chipStar.

*: The translation step could be replaced with the SPIR-V backend if it is in good state for chipStar.

@jhuber6
Copy link
Contributor Author

jhuber6 commented Aug 15, 2025

I'd suggest you just try doing --offload-new-driver with -### and -v and see what doesn't work.

Running the hipspv-toolchain.hip case with the --offload-new-driver option reveals that it is missing steps for running custom pass plugin on the final device bitcode and SPIR-V translation*, crucial for chipStar.

*: The translation step could be replaced with the SPIR-V backend if it is in good state for chipStar.

Are you able to make PRs for this? The main difference is that when we do rdc code it goes through the linker wrapper which will then invoke clang --target=spirv64 file1 file2. We have a list of flags to pass through where we create the linker wrapper job.

Summary:
This patch matches CUDA, moving the HIP compilation jobs to the new
driver by default. The old behavior will return with
`--no-offload-new-driver`. The main difference is that objects compiled
with the old driver are no longer compatible and will need to be
recompiled or the old driver used.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants